Abstract

We introduce a simple model illustrating the role of context in communication and the 
challenge posed by uncertainty of knowledge of context. We consider a variant of distributional 
communication complexity where Alice gets some information x and Bob gets y, where (x, y) is
drawn from a known distribution, and Bob wishes to compute some function g(x, y) (with high
probability over (x, y)). In our variant, Alice does not know g, but only knows some function f
which is an approximation of g. Thus, the function being computed forms the context for the 
communication, and knowing it imperfectly models (mild) uncertainty in this context. 

A naive solution would be for Alice and Bob to first agree on some common function h that 
is close to both f and g and then use a protocol for h to compute h(x, y). We show that any
such agreement leads to a large overhead in communication ruling out such a universal solution. 

In contrast, we show that if g has a one-way communication protocol with complexity k in 
the standard setting, then it has a communication protocol with complexity $O(k \cdot (1 + I))$ in the
uncertain setting, where I denotes the mutual information between x and y. In the particular 
case where the input distribution is a product distribution, the protocol in the uncertain setting 
only incurs a constant factor blow-up in communication and error. 

Furthermore, we show that the dependence on the mutual information I is required. Namely,
we construct a class of functions along with a non-product distribution over (x, y) for which the 
communication complexity is a single bit in the standard setting but at least $\Omega(\sqrt{n})$ bits in the
uncertain setting.

1 Introduction


Most forms of communication involve communicating players that share a large common context 
and use this context to compress communication. In natural settings, the context may include 
understanding of language, and knowledge of the environment and laws. In designed
(computer-to-computer) settings, the context includes knowledge of the operating system, communication
protocols, and encoding/decoding mechanisms. Remarkably, especially in the natural setting, context
can seemingly be used to compress communication, even when it is not shared perfectly. This
ability to communicate despite a major source of uncertainty has led to a series of works attempting
to model various forms of communication amid uncertainty, starting with Goldreich, Juba and
Sudan [JS08, GJS12] followed by [JKKS11, JS11, JW13, HS14, CGMS15]. This current work introduces
a new theme to this series of works by introducing a functional notion of uncertainty and
studying this model. We start by describing our model and results below and then contrast our 
model with some of the previous works. 

Model. Our model builds upon the classical setup of communication complexity due to Yao [Yao79],
and we develop it here. The classical model considers two interacting players Alice and Bob each
possessing some private information x and y with x known only to Alice and y to Bob. They wish to
compute some joint function g(x, y) and would like to do so while exchanging the minimum possible
number of bits. In this work, we suggest that the function g is the context of the communication and
consider a setting where it is shared imperfectly. Specifically, we say that Bob knows the function
g and Alice knows some approximation f to g (with f not being known to Bob). This leads to the
question: when can Alice and Bob interact to compute g(x, y) with limited communication?

It is clear that if $x \in \{0,1\}^n$, then n bits of communication suffice: Alice can simply ignore f
and send x to Bob. We wish to consider settings that improve on this. To do so correctly on every
input, a necessary condition is that g must have low communication complexity in the standard
model. However, this necessary condition does not seem to be sufficient, since Alice only has
an approximation f to g. Thus, we settle for a weaker goal: determining g correctly only on most
inputs. This puts us in a distributional communication complexity setting. A necessary condition
now is that g must have a low-error low-communication protocol in the standard setting. The
question is then: can g be computed with low error and low communication when Alice only knows
an approximation f to g (with f being unknown to Bob)?

More precisely, in this setting, the input to Alice is a pair (f, x) and the input to Bob is a pair
(g, y). The functions (f, g) are adversarially chosen subject to the restrictions that they are close
to each other (under some distribution $\mu$ on the inputs) and that g (and hence f) has a low-error
low-communication protocol. The pair (x, y) is drawn from the distribution $\mu$ (independent of the
choice of f and g). The players both know $\mu$ in addition to their respective inputs.

Results. In order to describe our results, we first introduce some notation. Let $\delta_\mu(f, g)$ denote the
(weighted and normalized) Hamming distance between f and g with respect to the distribution $\mu$.
Let $\mathsf{CC}^{\mu}_{\epsilon}(f)$ denote the minimum communication complexity of a protocol computing f correctly on
all but an $\epsilon$ fraction of the inputs. Let $\mathsf{owCC}^{\mu}_{\epsilon}(f)$ denote the corresponding one-way communication
complexity of f. Given a family $\mathcal{F}$ of pairs of functions (f, g), we denote the uncertain complexity
$\mathsf{CCU}^{\mu}_{\epsilon}(\mathcal{F})$ to be the minimum over all public-coin protocols $\Pi$ of the maximum over $(f, g) \in \mathcal{F}$,
(x, y) in the support of $\mu$ and settings of public coins, of the communication cost of $\Pi$, subject to
the condition that for every $(f, g) \in \mathcal{F}$, $\Pi$ outputs g(x, y) with probability $1 - \epsilon$ over the choice of
(x, y) and the shared randomness. That is,


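$$\mathsf{CCU}^{\mu}_{\epsilon}(\mathcal{F}) \triangleq \min_{\{\Pi \,\mid\, \delta_\mu(\Pi, g) \le \epsilon\}} \; \max_{\{(f,g) \in \mathcal{F},\ (x,y) \in \mathrm{supp}(\mu),\ \text{public coins}\}} \{\text{Comm. cost of } \Pi((f, x), (g, y))\}.$$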


Similarly, let $\mathsf{owCCU}^{\mu}_{\epsilon}(\mathcal{F})$ denote the one-way uncertain communication complexity of $\mathcal{F}$.

Our first result (Theorem 1.1) shows that if $\mu$ is a distribution on which f and g are close and
each has a one-way protocol with communication k bits in the standard model, then the pair (f, g)
has one-way uncertain communication complexity of at most $O(k \cdot (1 + I))$ bits with I being the
mutual information¹ of $(x, y) \sim \mu$. More precisely, let $\mathsf{ow}\mathcal{F}^{\mu}_{k,\epsilon,\delta}$ denote the family of all pairs of
functions (f, g) with $\mathsf{owCC}^{\mu}_{\epsilon}(f), \mathsf{owCC}^{\mu}_{\epsilon}(g) \le k$ and $\delta_\mu(f, g) \le \delta$. We prove the following theorem.


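Theorem 1.1. There exists an absolute constant c such that for every pair of finite sets $\mathcal{X}$ and $\mathcal{Y}$,
every distribution $\mu$ over $\mathcal{X} \times \mathcal{Y}$ and every $\theta > 0$, it holds that

$$\mathsf{owCCU}^{\mu}_{\epsilon + 2\delta + \theta}\left(\mathsf{ow}\mathcal{F}^{\mu}_{k,\epsilon,\delta}\right) \le \frac{c\,(k + \log(1/\theta))}{\theta^2} \cdot \left(1 + \frac{I(X;Y)}{\theta^2}\right). \qquad (1)$$
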
In the special case where $\mu$ is a product distribution, we have $I(X;Y) = 0$ and we obtain the
following particularly interesting corollary of Theorem 1.1.


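Corollary 1.2. There exists an absolute constant c such that for every pair of finite sets $\mathcal{X}$ and
$\mathcal{Y}$, every product distribution $\mu$ over $\mathcal{X} \times \mathcal{Y}$ and every $\theta > 0$, it holds that

$$\mathsf{owCCU}^{\mu}_{\epsilon + 2\delta + \theta}\left(\mathsf{ow}\mathcal{F}^{\mu}_{k,\epsilon,\delta}\right) \le \frac{c\,(k + \log(1/\theta))}{\theta^2}.$$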


In words, Corollary 1.2 says that for product distributions and for constant error probabilities,
communication in the uncertain model is only a constant factor larger than in the standard model.
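
To make the role of the mutual information concrete, the following Python sketch (an illustration only, not part of the formal development) evaluates the standard definition $I(X;Y) = \mathbb{E}_{(x,y) \sim \mu}[\log(\mu(x,y)/(\mu_X(x)\mu_Y(y)))]$ on two toy distributions. The function name `mutual_information` and both example distributions are hypothetical choices made for this sketch.

```python
import math

def mutual_information(mu):
    """I(X;Y) in bits for a joint distribution given as {(x, y): probability}."""
    mu_x, mu_y = {}, {}
    for (x, y), p in mu.items():
        mu_x[x] = mu_x.get(x, 0.0) + p   # marginal over X
        mu_y[y] = mu_y.get(y, 0.0) + p   # marginal over Y
    return sum(p * math.log2(p / (mu_x[x] * mu_y[y]))
               for (x, y), p in mu.items() if p > 0)

# Hypothetical product distribution: x and y are independent.
product = {(x, y): px * py
           for x, px in ((0, 0.3), (1, 0.7))
           for y, py in ((0, 0.5), (1, 0.5))}

# Hypothetical fully correlated distribution: y always equals x.
correlated = {(0, 0): 0.5, (1, 1): 0.5}

i_product = mutual_information(product)
i_correlated = mutual_information(correlated)
```

Under these assumptions the product distribution has mutual information zero (up to floating-point error), which is the regime of Corollary 1.2, while the correlated pair of uniform bits carries one full bit.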

Our result is significant in that it achieves (moderately) reliable communication despite
uncertainty about the context, even when the uncertainty itself is hard to resolve. To elaborate on
this statement, note that one hope for achieving a low-communication protocol for g would be for 
Alice and Bob to first agree on some function q that is close to / and g, and then apply some low- 
communication protocol for this common function q. This would be the “resolve the uncertainty 
first” approach. We prove (Theorem 3.2) that resolving the uncertainty can be very expensive 
(much more so than even the trivial protocol of sending x) and hence, this would not be a way to 
prove Theorem 1.1. Instead, we show a path around the inherent uncertainty to computing the 
desired function, and this leads to a proof of Theorem 1.1. To handle non-product distributions 
in Theorem 1.1, we in particular use a one-way distributional variant of the correlated sampling 
protocol of Braverman and Rao [BR11]. For a high-level overview of the proof of Theorem 1.1, we
refer the reader to Section 4.1. 

We now describe our lower bound. Given the upper bound in Theorem 1.1, a natural question 
is whether the dependence on $I(X;Y)$ in the right-hand side of Equation (1) is actually needed. In
other words, is it also the case that for non-product distributions, contextual uncertainty can only
cause a constant-factor blow-up in communication (for constant error probabilities)? Perhaps
surprisingly, the answer to this question turns out to be negative. Namely, we show that a dependence
of the communication in the uncertain setting on $I(X;Y)$ is required.

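¹ Given a distribution $\mu$ over a pair (X, Y) of random variables with marginals $\mu_X$ and $\mu_Y$ over $\mathcal{X}$ and $\mathcal{Y}$ respectively,
the mutual information of X and Y is defined as $I(X;Y) = \mathbb{E}_{(x,y) \sim \mu}\left[\log\left(\frac{\mu(x,y)}{\mu_X(x)\,\mu_Y(y)}\right)\right]$.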
Theorem 1.3. There exist a distribution $\mu$ and a function class $\mathcal{F} \subseteq \mathsf{ow}\mathcal{F}^{\mu}_{1,0,\delta}$ such that for every
$\epsilon > 0$,

$$\mathsf{CCU}^{\mu}_{\frac{1}{2} - \epsilon}(\mathcal{F}) \ge \Omega(\sqrt{\delta n}) - \log(1/\epsilon).$$


In particular, if $\delta$ is any small constant (e.g., 1/5), then Theorem 1.3 asserts the existence of a
distribution and a class of distance-$\delta$ functions for which the zero-error (one-way) communication
complexity in the standard model is a single bit, but under contextual uncertainty, any two-way
protocol (with an arbitrary number of rounds of interaction) having a noticeable advantage over
random guessing requires $\Omega(\sqrt{n})$ bits of communication! We note that the distribution in
Theorem 1.3 has mutual information $\approx n$, so Theorem 1.3 rules out improving the dependence on the
mutual information in Equation (1) to anything smaller than $\sqrt{I(X;Y)}$. It is an interesting open
question to determine the correct exponent of $I(X;Y)$ in Equation (1).

In order to prove Theorem 1.3, the function class $\mathcal{F}$ will essentially consist of the set of all close-by
pairs of parity functions and the distribution $\mu$ will correspond to the noisy Boolean hypercube.
We are then able to reduce the problem of computing $\mathcal{F}$ under $\mu$ with contextual uncertainty, to the
problem of computing a related function in the standard distributional communication complexity
model (i.e., without uncertainty) under a related distribution. We then use the discrepancy method
to prove a lower bound on the communication complexity of the new problem. This task itself
reduces to upper bounding the spectral norm of a certain communication matrix. The choice of our
underlying distribution then implies a tensor structure for this matrix, which reduces the spectral
norm computation to bounding the largest singular value of an explicit family of $4 \times 4$ matrices.
For more details about the proof of Theorem 1.3, we refer the reader to Section 5.


Contrast with prior work. The first works to consider communication with uncertainty in a 
manner similar to this work were those of [JS08, GJS12]. Their goal was to model an extreme form of 
uncertainty, where Alice and Bob do not have any prior (known) commonality in context and indeed 
both come with their own “protocol” which tells them how to communicate. So communication is 
needed even to resolve this uncertainty. While their setting is thus very broad, the solutions they 
propose are much slower and typically involve resolving the uncertainty as a first step. 

The later works [JKKS11, HS14, CGMS15] tried to restrict the forms of uncertainty to see
when it could lead to more efficient communication solutions. For instance, Juba et al. [JKKS11]
consider the compression problem when Alice and Bob do not completely agree on the prior. This
introduces some uncertainty in the beliefs, and they provide fairly efficient solutions by restricting
the uncertainty to a manageable form. Canonne et al. [CGMS15] were the first to connect this
stream of work to communication complexity, which seems to be the right umbrella to study the
broader communication problems. The imperfectness they study is however restricted to the
randomness shared by the communicating parties, and does not incorporate any other elements. They
suggest studying imperfect understanding of the function being computed as a general direction, 
though they do not suggest specific definitions, which we in particular do in this work. 

Organization. In Section 2, we carefully develop the uncertain communication complexity model
after recalling the standard distributional communication complexity model. In Section 3, we prove
the hardness of contextual agreement. In Section 4, we prove our main upper bound (Theorem 1.1).
In Section 5, we prove our main lower bound (Theorem 1.3). For a discussion of some intriguing
future directions that arise from this work, we refer the reader to the concluding Section 6.

2 The Uncertain Communication Complexity Model


We start by recalling the classical communication complexity model of Yao [Yao79] and then present 
our definition and measures. 

2.1 Communication Complexity 

We start with some basic notation. For an integer $n \in \mathbb{N}$, we denote by [n] the set $\{1, \ldots, n\}$.
We use $\log x$ to denote a logarithm in base 2. For two sets A and B, we denote by $A \triangle B$ their
symmetric difference. For a distribution $\mu$, we denote by $x \sim \mu$ the process of sampling a value x
from the distribution $\mu$. Similarly, for a set $\mathcal{X}$ we denote by $x \sim \mathcal{X}$ the process of sampling a value
x from the uniform distribution over $\mathcal{X}$. For any event E, let $\mathbf{1}(E)$ be the 0-1 indicator of E. For
a probability distribution $\mu$ over $\mathcal{X} \times \mathcal{Y}$, we denote by $\mu_X$ the marginal of $\mu$ over $\mathcal{X}$. By $\mu_{Y|x}$, we
denote the conditional distribution of $\mu$ over $\mathcal{Y}$ conditioned on X = x.

Given a distribution $\mu$ supported on $\mathcal{X}$ and functions $f, g: \mathcal{X} \to S$, we let $\delta_\mu(f, g)$ denote the
(weighted and normalized) Hamming distance between f and g, i.e., $\delta_\mu(f, g) = \Pr_{x \sim \mu}[f(x) \ne g(x)]$.
(Note that this definition extends naturally to probabilistic functions f and g, by letting f(x)
and g(x) be sampled independently.)
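
As a small illustration of this distance (a sketch under our own toy assumptions, not taken from the paper), the following Python snippet computes $\delta_\mu(f, g)$ for deterministic functions when $\mu$ is given explicitly as a table of probabilities. The helper name `delta` and the choice of parity versus majority on 3-bit strings are hypothetical.

```python
from fractions import Fraction

def delta(mu, f, g):
    """Weighted Hamming distance: total mu-mass of the points where f and g differ."""
    return sum(p for x, p in mu.items() if f(x) != g(x))

# Hypothetical toy instance: the uniform distribution over 3-bit strings.
points = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
mu = {x: Fraction(1, len(points)) for x in points}

parity = lambda x: (x[0] + x[1] + x[2]) % 2
majority = lambda x: int(sum(x) >= 2)

dist = delta(mu, parity, majority)   # the two functions agree only on 000 and 111
```

Exact rational arithmetic is used here so that the distance can be compared against a threshold such as $\delta$ without rounding concerns.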

We now turn to the definition of communication complexity. A more thorough introduction can 
be found in [KN97]. Let $f: \mathcal{X} \times \mathcal{Y} \to \{0,1\}$ be a function and Alice and Bob be two parties. A
protocol $\Pi$ between Alice and Bob specifies how and what Alice and Bob communicate given their
respective inputs and communication thus far. It also specifies when they stop and produce an
output (that we require to be produced by Bob). A protocol is said to be one-way if it involves
a single message from Alice to Bob, followed by Bob producing the output. The protocol $\Pi$ is
said to compute f if for every $(x, y) \in \mathcal{X} \times \mathcal{Y}$ it holds that $\Pi(x, y) = f(x, y)$. The communication
complexity of $\Pi$ is the number of bits transmitted during the execution of the protocol between
Alice and Bob. The communication complexity of f is the minimal communication complexity of
a protocol computing f.

It is standard to relax the above setting by introducing a distribution $\mu$ over the input space
$\mathcal{X} \times \mathcal{Y}$ and requiring the protocol to succeed with high probability (rather than with probability 1).
We say that a protocol $\Pi$ $\epsilon$-computes a function f under distribution $\mu$ if $\delta_\mu(\Pi(x, y), f(x, y)) \le \epsilon$.

Definition 2.1 (Distributional Communication Complexity). Let $f: \mathcal{X} \times \mathcal{Y} \to \{0,1\}$ be a Boolean
function and $\mu$ be a probability distribution over $\mathcal{X} \times \mathcal{Y}$. The distributional communication
complexity of f under $\mu$ with error $\epsilon$, denoted by $\mathsf{CC}^{\mu}_{\epsilon}(f)$, is defined as the minimum over all protocols
$\Pi$ that $\epsilon$-compute f over $\mu$, of the communication complexity of $\Pi$. The one-way communication
complexity $\mathsf{owCC}^{\mu}_{\epsilon}(f)$ is defined similarly by minimizing over one-way protocols $\Pi$.
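
The two quantities in Definition 2.1, the error of a protocol under $\mu$ and its communication cost, can be evaluated directly on small instances. The Python sketch below does so for deterministic one-way protocols, modeled (as an assumption of this sketch) by a pair of hypothetical callables: `alice_message` maps x to a tuple of bits and `bob_output` maps that tuple and y to a bit.

```python
from fractions import Fraction
from itertools import product

def protocol_error(mu, f, alice_message, bob_output):
    """Probability, over (x, y) ~ mu, that Bob's output differs from f(x, y)."""
    return sum(p for (x, y), p in mu.items()
               if bob_output(alice_message(x), y) != f(x, y))

def protocol_cost(mu, alice_message):
    """Worst-case message length in bits over the support of mu."""
    return max(len(alice_message(x)) for (x, y), p in mu.items() if p > 0)

# Hypothetical instance: uniform distribution over pairs of 2-bit strings,
# and f(x, y) equal to the XOR of the first bits of x and y.
bits = list(product((0, 1), repeat=2))
mu = {(x, y): Fraction(1, 16) for x in bits for y in bits}
f = lambda x, y: x[0] ^ y[0]

one_bit = lambda x: (x[0],)   # Alice sends her first bit
silent = lambda x: ()         # Alice sends nothing

err_one_bit = protocol_error(mu, f, one_bit, lambda msg, y: msg[0] ^ y[0])
err_silent = protocol_error(mu, f, silent, lambda msg, y: y[0])
```

In this toy instance the one-bit protocol 0-computes f, while the silent protocol errs exactly when the first bit of x is 1, that is, with probability 1/2.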

We note that it is also standard to provide Alice and Bob with a shared random string which
is independent of x, y and f. In the distributional communication complexity model, it is a known
fact that any protocol with shared randomness can be used to get a protocol that does not use
shared randomness without increasing its distributional communication complexity.

In this paper, unless stated otherwise, whenever we refer to a protocol, we think of the input
pair (x, y) as coming from a distribution.

2.2 Uncertain Communication Complexity

We now turn to the central definition of this paper, namely uncertain communication complexity. 
Our goal is to understand how Alice and Bob can communicate when the function that Bob wishes 
to determine is not known to Alice. In this setting, we make the functions g (that Bob wants to
compute) and f (Alice’s estimate of g) explicitly part of the input to the protocol $\Pi$. Thus, in this
setting a protocol $\Pi$ specifies how Alice with input (f, x) and Bob with input (g, y) communicate,
and how they stop and produce an output. We say that $\Pi$ computes (f, g) if for every $(x, y) \in \mathcal{X} \times \mathcal{Y}$,
the protocol outputs g(x, y). We say that a (public-coin) protocol $\Pi$ $\epsilon$-computes (f, g) over $\mu$ if
$\delta_\mu(g, \Pi) \le \epsilon$.

Next, one may be tempted to define the communication complexity of a pair of functions (f, g)
as the minimum over all protocols that compute (f, g) of their maximum communication. But
this does not capture the uncertainty! (Rather, a protocol that works for the pair corresponds
to both Alice and Bob knowing both f and g.) To model uncertainty, we have to consider the
communication complexity of a whole class of pairs of functions, from which the pair (f, g) is
chosen (in our case by an adversary).

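Let $\mathcal{F} \subseteq \{f: \mathcal{X} \times \mathcal{Y} \to \{0,1\}\}^2$ be a family of pairs of Boolean functions with domain $\mathcal{X} \times \mathcal{Y}$.
We say that a public-coin protocol $\Pi$ $\epsilon$-computes $\mathcal{F}$ over $\mu$ if for every $(f, g) \in \mathcal{F}$, we have that $\Pi$
$\epsilon$-computes (f, g) over $\mu$. We are now ready to present our main definition.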

Definition 2.2 (Contextually Uncertain Communication Complexity). Let $\mu$ be a distribution on
$\mathcal{X} \times \mathcal{Y}$ and $\mathcal{F} \subseteq \{f: \mathcal{X} \times \mathcal{Y} \to \{0,1\}\}^2$. The communication complexity of $\mathcal{F}$ under contextual
uncertainty, denoted $\mathsf{CCU}^{\mu}_{\epsilon}(\mathcal{F})$, is the minimum over all public-coin protocols $\Pi$ that $\epsilon$-compute $\mathcal{F}$
over $\mu$, of the maximum communication complexity of $\Pi$ over $(f, g) \in \mathcal{F}$, (x, y) from the support
of $\mu$ and settings of the public coins.

As usual, the one-way contextually uncertain communication complexity $\mathsf{owCCU}^{\mu}_{\epsilon}(\mathcal{F})$ is defined
similarly.

We remark that while in the standard distributional model of Subsection 2.1, shared randomness
can be assumed without loss of generality, this is not necessarily the case in Definition 2.2. This
is because in principle, shared randomness can help fool the adversary who is selecting the pair
$(f, g) \in \mathcal{F}$. Also, observe that in the special case where $\mathcal{F} = \{(f, g)\}$, Definition 2.2 boils down
to the standard definition of distributional communication complexity (i.e., Definition 2.1) for the
function g, and we thus have $\mathsf{CCU}^{\mu}_{\epsilon}(\{(f, g)\}) = \mathsf{CC}^{\mu}_{\epsilon}(g)$. Furthermore, the uncertain communication
complexity is monotone, i.e., if $\mathcal{F} \subseteq \mathcal{F}'$ then $\mathsf{CCU}^{\mu}_{\epsilon}(\mathcal{F}) \le \mathsf{CCU}^{\mu}_{\epsilon}(\mathcal{F}')$. Hence, we conclude that
$\mathsf{CCU}^{\mu}_{\epsilon}(\mathcal{F}) \ge \max_{\{g \,\mid\, \exists f \text{ s.t. } (f, g) \in \mathcal{F}\}} \{\mathsf{CC}^{\mu}_{\epsilon}(g)\}$.

In this work, we attempt to identify a setting under which the above lower bound can be
matched. If the set of functions $\Gamma(g) = \{f \mid (f, g) \in \mathcal{F}\}$ is not sufficiently informative about g,
then it seems hard to conceive of settings where Alice can do non-trivially well. We thus pick a
simple and natural restriction on $\Gamma(g)$, namely, that it contains functions that are close to g (in
$\delta_\mu$-distance). This leads us to our main target classes. For parameters $k, \epsilon, \delta > 0$, define the sets of
pairs of functions

$$\mathcal{F}^{\mu}_{k,\epsilon,\delta} = \{(f, g) \mid \delta_\mu(f, g) \le \delta \text{ and } \mathsf{CC}^{\mu}_{\epsilon}(f), \mathsf{CC}^{\mu}_{\epsilon}(g) \le k\}$$

and

$$\mathsf{ow}\mathcal{F}^{\mu}_{k,\epsilon,\delta} = \{(f, g) \mid \delta_\mu(f, g) \le \delta \text{ and } \mathsf{owCC}^{\mu}_{\epsilon}(f), \mathsf{owCC}^{\mu}_{\epsilon}(g) \le k\}.$$

In words, $\mathcal{F}^{\mu}_{k,\epsilon,\delta}$ (resp. $\mathsf{ow}\mathcal{F}^{\mu}_{k,\epsilon,\delta}$) considers all possible functions g with communication complexity
(resp. one-way communication complexity) at most k with Alice being roughly under all possible
uncertainties within distance $\delta$ of Bob.²

It is clear that $\mathsf{owCCU}^{\mu}_{\epsilon}(\mathsf{ow}\mathcal{F}^{\mu}_{k,\epsilon,\delta}) \ge k$. Our first main result, Theorem 1.1, gives an upper bound
on this quantity, which in the particular case of product distributions is comparable to k (up to
a constant factor increase in the error and communication complexity). In Theorem 3.2 we show
that a naive strategy that attempts to reduce the uncertain communication problem to a “function
agreement problem” (where Alice and Bob agree on a function q that is close to f and g and then
use a protocol for q) cannot work. Furthermore, our second main result, Theorem 1.3, shows that
for general non-product distributions, $\mathsf{CCU}^{\mu}_{\epsilon}(\mathsf{ow}\mathcal{F}^{\mu}_{k,\epsilon,\delta})$ can be much larger than k. More precisely,
we construct a function class along with a distribution $\mu$ for which the one-way communication
complexity in the standard model is a single bit whereas, under contextual uncertainty, the two-way
communication complexity is at least $\Omega(\sqrt{n})$!

3 Hardness of Contextual Agreement 

In this section, we show that even if both f and g have small one-way distributional communication
complexity on some distribution $\mu$, agreeing on a q such that $\delta_\mu(q, f)$ is small takes communication
that is roughly the size of the bit representation of f (which is exponential in the size of the input).
Thus, agreeing on q before simulating a protocol for q is exponentially costlier than even the trivial 
protocol where Alice sends her input x to Bob. Formally, we consider the following communication 
problem: 

Definition 3.1 ($\mathrm{Agree}_{\delta,\gamma}(\mathcal{F})$). For a family of pairs of functions $\mathcal{F} \subseteq \{f: \mathcal{X} \times \mathcal{Y} \to \{0,1\}\}^2$, the
$\mathcal{F}$-agreement problem with parameters $\delta, \gamma > 0$ is the communication problem where Alice gets f
and Bob gets g such that $(f, g) \in \mathcal{F}$ and their goal is for Alice to output $q_A$ and Bob to output $q_B$
such that $\delta(q_A, f), \delta(q_B, g) \le \delta$ and $\Pr[q_A = q_B] \ge \gamma$.

Somewhat abusing notation, we will use $\mathrm{Agree}_{\delta,\gamma}(\mathcal{D})$ to denote the distributional problem
where $\mathcal{D}$ is a distribution on $\{f: \mathcal{X} \times \mathcal{Y} \to \{0,1\}\}^2$ and the goal now is to get agreement with
probability $\gamma$ over the randomness of the protocol and the input.

If the agreement problem could be solved with low communication for the family $\mathcal{F}_{k,\epsilon,\delta}$ as
defined at the end of Section 2, then this would turn into a natural protocol for $\mathsf{CCU}(\mathcal{F}_{k,\epsilon',\delta'})$ for
some positive $\epsilon'$ and $\delta'$ as well. Our following theorem proves that agreement is a huge overkill.

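Theorem 3.2. For every $\delta, \delta_2 > 0$, there exist $\alpha > 0$ and a family $\mathcal{F} \subseteq \mathcal{F}_{0,0,\delta}$ such that for every
$\gamma > 0$, it holds that $\mathsf{CC}(\mathrm{Agree}_{\delta_2,\gamma}(\mathcal{F})) \ge \alpha|\mathcal{Y}| - \log(1/\gamma)$.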

In words, Theorem 3.2 says that there is a family of pairs of functions supported on functions
of zero communication complexity (with zero error) for which agreement takes communication
polynomial in the size of the domain of the functions. Note that this is exponentially larger than
the trivial communication complexity for any function g, which is at most $\min\{1 + \log|\mathcal{Y}|, \log|\mathcal{X}|\}$.

We stress that while an agreement lower bound for zero communication functions may feel like a
lower bound for a toy problem, a lower bound for this setting is inherent in any separation between
agreement complexity for $\mathcal{F}$ and communication complexity with uncertainty for $\mathcal{F}$. To see this,
note that given any input to the $\mathsf{CCU}(\mathcal{F})$ problem, Alice and Bob can execute any protocol for
$\mathsf{CCU}(\mathcal{F})$, pinning down the value of the function to be computed with high probability and low
communication. If one considers the remaining challenge to agreement, it comes from a zero
communication problem.

² For the sake of symmetry, we insist that $\mathsf{CC}^{\mu}_{\epsilon}(f) \le k$ (resp. $\mathsf{owCC}^{\mu}_{\epsilon}(f) \le k$). We need not have insisted on it
but since the other conditions anyhow imply that $\mathsf{CC}^{\mu}_{\epsilon+\delta}(f) \le k$ (resp. $\mathsf{owCC}^{\mu}_{\epsilon+\delta}(f) \le k$), we decided to include this
stronger condition for aesthetic reasons.

Our proof of Theorem 3.2 uses a lower bound on the communication complexity of the agreement
distillation (with imperfectly shared randomness) problem defined in [CGMS15], who in turn rely
on a lower bound for randomness extraction from correlated sources due to Bogdanov and
Mossel [BM11].

We describe their problem below and the result that we use. We note that their context is 
slightly different and our description below is a reformulation. First, we define the notion of
$\rho$-perturbed sequences of bits. A pair of bits (a, b) is said to be a pair of $\rho$-perturbed uniform bits if
a is uniform over {0,1}, and b = a with probability $1 - \rho$ and $b \ne a$ with probability $\rho$. A pair of
sequences of bits (r, s) is said to be $\rho$-perturbed if $r = (r_1, \ldots, r_n)$ and $s = (s_1, \ldots, s_n)$ and each
coordinate pair $(r_i, s_i)$ is a $\rho$-perturbed uniform pair drawn independently of all other pairs. For a
random variable W, we define its min-entropy as $H_\infty(W) = \min_{w \in \mathrm{supp}(W)} \{-\log(\Pr[W = w])\}$.
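
The two notions just defined are easy to experiment with. The Python sketch below (an illustration only; the names `sample_perturbed` and `min_entropy`, the seed and the parameter values are all hypothetical) samples a $\rho$-perturbed pair of bit sequences and computes the min-entropy of an explicitly given distribution.

```python
import math
import random

def sample_perturbed(n, rho, rng):
    """Sample (r, s): r is uniform over {0,1}^n, and each s_i differs from r_i
    independently with probability rho."""
    r = [rng.randrange(2) for _ in range(n)]
    s = [bit ^ (rng.random() < rho) for bit in r]
    return r, s

def min_entropy(dist):
    """H_inf(W) = min over the support of -log2 Pr[W = w], for dist = {w: probability}."""
    return -math.log2(max(dist.values()))

rng = random.Random(0)                      # fixed seed, chosen arbitrarily
r, s = sample_perturbed(20000, 0.1, rng)
disagreement = sum(a != b for a, b in zip(r, s)) / len(r)

h_uniform = min_entropy({w: 1 / 8 for w in range(8)})
h_skewed = min_entropy({"a": 0.5, "b": 0.25, "c": 0.25})
```

The empirical disagreement rate concentrates around $\rho$, and the min-entropy is governed only by the most likely outcome: 3 bits for the uniform distribution on 8 values, but only 1 bit for the skewed example.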

Definition 3.3 ($\mathrm{Agreement\text{-}Distillation}^{k}_{\gamma,\rho}$). In this problem, Alice and Bob get as inputs r and
s, where (r, s) form a $\rho$-perturbed sequence of bits. Their goal is to communicate deterministically
and produce as outputs $w_A$ (Alice’s output) and $w_B$ (Bob’s output) with the following properties:
(i) $H_\infty(w_A), H_\infty(w_B) \ge k$ and (ii) $\Pr_{(r,s)}[w_A = w_B] \ge \gamma$.

Lemma 3.4 ([CGMS15, Theorem 2]). For every $\rho > 0$, there exists $\epsilon > 0$ such that for every
k and $\gamma$, it holds that every deterministic protocol $\Pi$ that computes $\mathrm{Agreement\text{-}Distillation}^{k}_{\gamma,\rho}$ has communication
complexity at least $\epsilon k - \log(1/\gamma)$.

We note that while the agreement distillation problem is very similar to our agreement problem, 
there are some syntactic differences. We are considering pairs of functions with low communication
complexity, whereas the agreement-distillation problem considers arbitrary random sequences.
Also, our output criterion is proximity to the input functions, whereas in the agreement-distillation
problem, we need to produce high-entropy outputs. Finally, we want a lower bound for our agreement
problem when Alice and Bob are allowed to share perfect randomness while the agreement-distillation
bound only holds for deterministic protocols. Nevertheless, we are able to reduce to
their setting quite easily as we will see shortly. 

Our proof of Theorem 3.2 uses the standard Chernoff-Hoeffding tail inequality on random
variables that we include below. Denote $\exp(x) = e^x$, where e is the base of the natural logarithm.

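Proposition 3.5 (Chernoff bound). Let $X = \sum_{i=1}^{n} X_i$ be a sum of identically distributed independent
random variables $X_1, \ldots, X_n \in \{0,1\}$. Let $\mu = \mathbb{E}[X] = \sum_{i=1}^{n} \mathbb{E}[X_i]$. It holds that for
$\delta \in (0,1)$,

$$\Pr[X < (1 - \delta)\mu] \le \exp(-\delta^2 \mu / 2)$$

and

$$\Pr[X > (1 + \delta)\mu] \le \exp(-\delta^2 \mu / 3),$$

and for $a > 0$,

$$\Pr[X > \mu + a] \le \exp(-2a^2/n).$$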
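
As a sanity check on how these bounds behave (a numerical illustration only, with parameter values chosen arbitrarily for this sketch), the following Python snippet compares the exact upper tail of a binomial random variable with the multiplicative and additive bounds of Proposition 3.5. The helper `binom_upper_tail` is a hypothetical name.

```python
import math

def binom_upper_tail(n, p, threshold):
    """Exact Pr[X > threshold] for X ~ Binomial(n, p)."""
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j)
               for j in range(n + 1) if j > threshold)

# Arbitrary illustrative parameters: n coin flips with bias p, deviation d.
n, p, d = 200, 0.1, 0.5
mean = n * p

exact_multiplicative = binom_upper_tail(n, p, (1 + d) * mean)
chernoff = math.exp(-d * d * mean / 3)        # bound on Pr[X > (1 + d) * mean]

a = 10
exact_additive = binom_upper_tail(n, p, mean + a)
hoeffding = math.exp(-2 * a * a / n)          # bound on Pr[X > mean + a]
```

In this instance both exact tails sit well below the corresponding bounds, which is expected since the bounds are not tight for moderate n.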
Proof of Theorem 3.2. We prove the theorem for $\alpha < \delta/6$, in which case we may assume $\gamma > \exp(-\delta|\mathcal{Y}|/6)$
since otherwise the right-hand side is non-positive.

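Let $\mathcal{F}_B$ denote the set of functions that depend only on Bob’s inputs, i.e., $f \in \mathcal{F}_B$ if there exists
$f': \mathcal{Y} \to \{0,1\}$ such that $f(x, y) = f'(y)$ for all x, y. Our family $\mathcal{F}$ will be a subset of $\mathcal{F}_B \times \mathcal{F}_B$,
the subset that contains functions that are at most $\delta|\mathcal{Y}|$ apart:

$$\mathcal{F} = \{(f, g) \in \mathcal{F}_B \times \mathcal{F}_B \mid \delta(f, g) \le \delta\}.$$

It is clear that the communication complexity of every function in the support of $\mathcal{F}$ is zero, with zero
error (Bob can compute it on his own) and so $\mathcal{F} \subseteq \mathcal{F}_{0,0,\delta}$. So it remains to prove a lower bound on
$\mathsf{CC}(\mathrm{Agree}_{\delta_2,\gamma}(\mathcal{F}))$.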

We prove our lower bound by picking a distribution $\mathcal{D}_\rho$ supported mostly on $\mathcal{F}$ and by giving a
lower bound on $\mathsf{CC}(\mathrm{Agree}_{\delta_2,\gamma}(\mathcal{D}_\rho))$. Let $\rho = \delta/2$. The distribution $\mathcal{D}_\rho$ is a simple one. It samples
(f, g) as follows. The function f is drawn uniformly at random from $\mathcal{F}_B$. Then, g is chosen to
be a “$\rho$-perturbation” of f, namely for every $y \in \mathcal{Y}$, $g'(y)$ is chosen to be equal to f(x, y) with
probability $1 - \rho$ and $1 - f(x, y)$ with probability $\rho$. For every $x \in \mathcal{X}$, we now set $g(x, y) = g'(y)$.

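By the Chernoff bound (see Proposition 3.5), we have that $\Pr_{(f,g) \sim \mathcal{D}_\rho}[\delta(f, g) > \delta] \le \exp(-\rho|\mathcal{Y}|/3) \le \gamma$.
So with overwhelmingly high probability, $\mathcal{D}_\rho$ draws elements from $\mathcal{F}$. In particular, if some
protocol solves $\mathrm{Agree}_{\delta_2,\gamma}(\mathcal{F})$, then it would also solve $\mathrm{Agree}_{\delta_2,2\gamma}(\mathcal{D}_\rho)$.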

We thus need to show a lower bound on the communication complexity of $\mathrm{Agree}_{\delta_2,2\gamma}(\mathcal{D}_\rho)$.
We now note that since this is a distributional problem, by Yao’s min-max principle, if there is
a randomized protocol to solve $\mathrm{Agree}_{\delta_2,2\gamma}(\mathcal{D}_\rho)$ with C bits of communication, then there is also
a deterministic protocol for the same problem and with the same complexity. Thus, it suffices to
lower bound the deterministic communication complexity of $\mathrm{Agree}_{\delta_2,2\gamma}(\mathcal{D}_\rho)$. Claim 3.6 shows that
any such protocol gives a deterministic protocol for Agreement-Distillation with $k = \Omega_{\delta_2}(|\mathcal{Y}|)$.
Combining this with Lemma 3.4 gives us the desired lower bound on $\mathsf{CC}(\mathrm{Agree}_{\delta_2,2\gamma}(\mathcal{D}_\rho))$ and
hence on $\mathsf{CC}(\mathrm{Agree}_{\delta_2,\gamma}(\mathcal{F}))$. □

Claim 3.6. Every protocol for $\mathrm{Agree}_{\delta_2,\gamma}(\mathcal{D}_\rho)$ is also a protocol for $\mathrm{Agreement\text{-}Distillation}^{k}_{\gamma,\rho}$
for $k = (1 - h(\delta_2))|\mathcal{Y}|$, where $h(\cdot)$ is the binary entropy function given by
$h(x) = -x \log x - (1 - x) \log(1 - x)$.

Proof. Suppose Alice and Bob are trying to solve $\mathrm{Agreement\text{-}Distillation}^{k}_{\gamma,\rho}$. They can sample
$\rho$-perturbed strings $r, s \in \{0,1\}^{|\mathcal{Y}|}$ and interpret them as functions $f', g': \mathcal{Y} \to \{0,1\}$ or equivalently
as functions $(f, g) \sim \mathcal{D}_\rho$. They can now simulate the protocol for $\mathrm{Agree}_{\delta_2,\gamma}(\mathcal{D}_\rho)$ and output
$q_A$ and $q_B$. By definition of Agree, we have $q_A = q_B$ with probability at least $\gamma$. So it suffices
to show that $H_\infty(q_A), H_\infty(q_B) \ge k$. But this is obvious since any function $q_A$ is output only if
$\delta(f, q_A) \le \delta_2$ and we have that $|\{f \mid \delta(f, q_A) \le \delta_2\}| \le 2^{h(\delta_2)|\mathcal{Y}|}$. Since the probability of sampling
f for any f is at most $2^{-|\mathcal{Y}|}$, we have that the probability of outputting $q_A$ for any $q_A$ is at most
$2^{-(1 - h(\delta_2))|\mathcal{Y}|}$. In other words, $H_\infty(q_A) \ge (1 - h(\delta_2))|\mathcal{Y}|$. Similarly, we can lower bound $H_\infty(q_B)$
and thus we have that the outputs of the protocol for Agree solve Agreement-Distillation
with $k = (1 - h(\delta_2))|\mathcal{Y}|$. □
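
The counting step in this proof rests on the standard entropy bound on the size of a Hamming ball. The Python sketch below checks it numerically for one arbitrary choice of parameters, assuming the uniform measure on $\mathcal{Y}$ (so that functions $\mathcal{Y} \to \{0,1\}$ are bit strings of length $n = |\mathcal{Y}|$) and a radius parameter at most 1/2. The names `binary_entropy` and `hamming_ball_size` are hypothetical.

```python
import math

def binary_entropy(x):
    """h(x) = -x log2 x - (1 - x) log2 (1 - x), with h(0) = h(1) = 0."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def hamming_ball_size(n, radius):
    """Number of n-bit strings within Hamming distance `radius` of a fixed string."""
    return sum(math.comb(n, j) for j in range(radius + 1))

# Arbitrary illustrative parameters: n = |Y| and a relative radius d2 <= 1/2.
n, d2 = 40, 0.25
ball = hamming_ball_size(n, int(d2 * n))
bound = 2 ** (binary_entropy(d2) * n)

# If f is uniform over all 2^n strings, any output within distance d2 of f
# appears with probability at most ball / 2^n, so its min-entropy is at least:
min_entropy_lower = n - math.log2(ball)
```

Since the ball has at most $2^{h(\delta_2) n}$ elements, the resulting min-entropy is at least $(1 - h(\delta_2)) n$, matching the value of k in Claim 3.6.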

4 One-way Communication with Contextual Uncertainty 

In this section, we prove Theorem 1.1. We start with a high-level description of the protocol. 


4.1 Overview of Protocol 

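Let $\mu$ be a distribution over an input space $\mathcal{X} \times \mathcal{Y}$. For any function $s: \mathcal{X} \times \mathcal{Y} \to \{0,1\}$ and any
$x \in \mathcal{X}$, we define the restriction of s to x to be the function $s_x: \mathcal{Y} \to \{0,1\}$ given by $s_x(y) = s(x, y)$
for any $y \in \mathcal{Y}$.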

We now give a high-level overview of the protocol. First, we consider the particular case
of Theorem 1.1 where $\mu$ is a product distribution, i.e., $\mu = \mu_X \times \mu_Y$. Note that in this case,
$I(X;Y) = 0$ in the right-hand side of Equation (1). We will handle the case of general (not
necessarily product) distributions later on.

The general idea is that given inputs (f, x), Alice can determine the restriction $f_x$ and she will
try to describe it to Bob. For most values of x, $f_x$ will be close (in $\delta_{\mu_Y}$-distance) to the function
$g_x$. Bob will try to use the (yet unspecified) description given by Alice in order to determine some
function B that is close to $g_x$. If he succeeds in doing so, he can output B(y) which would equal
$g_x(y)$ with high probability over y.

We next explain how Alice will describe $f_x$, and how Bob will determine some function B that
is close to $g_x$ based on Alice’s description. For the first part, we let Alice and Bob use shared
randomness in order to sample $y_1, \ldots, y_m$, where the $y_i$’s are drawn independently with $y_i \sim \mu_Y$,
and m is a parameter to be chosen later. Alice’s description of $f_x$ will then be $(f_x(y_1), \ldots, f_x(y_m)) \in \{0,1\}^m$.
Thus, the length of the communication is m bits and we need to show that setting m to be
roughly $O(k)$ suffices. Before we explain this, we first need to specify what Bob does with Alice’s
message.

As a first cut, let us consider the following natural strategy: Bob picks an $\hat{x} \in \mathcal{X}$ such that $g_{\hat{x}}$
is close to $f_x$ on $y_1, \ldots, y_m$, and sets $B = g_{\hat{x}}$. It is clear that if $\hat{x} = x$, then $B = g_{\hat{x}} = g_x$, and for
every $y \in \mathcal{Y}$, we would have $B(y) = g_x(y)$. Moreover, if $\hat{x}$ is such that $g_{\hat{x}}$ is close to $g_x$ (which is
itself close to $f_x$), then B(y) would now equal $g_x(y)$ with high probability. It remains to deal with
$\hat{x}$ such that $g_{\hat{x}}$ is far from $g_x$. Note that if we first fix any such $\hat{x}$ and then sample $y_1, \ldots, y_m$, then
with high probability, we would reveal that $g_{\hat{x}}$ is far from $g_x$. This is because $g_x$ is close to $f_x$, so
$g_{\hat{x}}$ should also be far from $f_x$. However, this idea alone cannot deal with all possible $\hat{x}$: using
a naive union bound over all possible $\hat{x} \in \mathcal{X}$ would require a failure probability of $1/|\mathcal{X}|$, which
would itself require setting m to be roughly $\log|\mathcal{X}|$. Indeed, smaller values of m should not suffice
since we have not yet used the fact that $\mathsf{CC}^{\mu}_{\epsilon}(g) \le k$, but we do so next.

Suppose that $\Pi$ is a one-way protocol with $k$ bits of communication. Then, note that Alice’s
message partitions $\mathcal{X}$ into $2^k$ sets, one corresponding to each message. Our modified strategy for
Bob is to let him pick a representative $\hat{x}$ from each set in this partition, and then set $B = g_{\hat{x}}$ for
an $\hat{x}$ among the representatives for which $g_{\hat{x}}$ and $f_x$ are the closest on the samples $y_1, \dots, y_m$. A
simple analysis shows that the $g_x$’s that lie inside the same set in this partition are close, and thus,
if we pick $\hat{x}$ to be the representative of the set containing $x$, then $g_{\hat{x}}$ and $f_x$ will be close on the
sampled points. For any other representative, once again if $g_{\hat{x}}$ is close to $g_x$, then $g_{\hat{x}}(y)$ will equal
$g_x(y)$ with high probability. For a representative $\hat{x}$ such that $g_{\hat{x}}$ is far from $g_x$ (which is itself close
to $f_x$), we can proceed as in the previous paragraph, and now the union bound works out since the
total number of representatives is only $2^k$. (*)

(*) We note that a similar idea was used in a somewhat different context by [BJKS02] (following on [KNR99]) in
order to characterize one-way communication complexity of any function under product distributions in terms of its
VC-dimension.

We now turn to the case of general (not necessarily product) distributions. In this case, we
would like to run the above protocol with $y_1, y_2, \dots, y_m$ sampled independently from $\mu_{Y|x}$ (instead
of $\mu_Y$). Note that Alice knows $x$ and hence knows the distribution $\mu_{Y|x}$. Unfortunately, Bob does
not know $\mu_{Y|x}$; he only knows $\mu_Y$ as a “proxy” for $\mu_{Y|x}$. While Alice and Bob cannot jointly sample
such $y_i$’s without communicating (as in the product case), they can still run the correlated sampling
protocol of [BR11] in order to agree on such samples while communicating at most $O(m \cdot I(X;Y))$
bits. The original correlated sampling procedure of [BR11] inherently used multiple rounds of
communication, but we are able in our case to turn it into a one-way protocol by leveraging the
fact that our setup is distributional (see Subsection 4.2 for more details).

The outline of the rest of this section is the following. In Subsection 4.2, we describe the 
properties of the correlated sampling procedure that we will use. In Subsection 4.3, we give the 
formal proof of Theorem 1.1. 

4.2 Correlated Sampling 

We start by recalling two standard notions from information theory. Given two distributions $P$ and
$Q$, the KL divergence between $P$ and $Q$ is defined as $D(P\|Q) = \mathbb{E}_{u \sim P}[\log(P(u)/Q(u))]$. Given a
joint distribution $\mu$ of a pair $(X,Y)$ of random variables with $\mu_X$ and $\mu_Y$ being the marginals of $\mu$
over $X$ and $Y$ respectively, the mutual information of $X$ and $Y$ is defined as $I(X;Y) = D(\mu \| \mu_X \mu_Y)$.
The following lemma summarizes the properties of the correlated sampling protocol of [BR11].

Lemma 4.1 ([BR11]). Let Alice be given a distribution $P$ and Bob be given a distribution $Q$ over
a common universe $U$. There is an interactive public-coin protocol that uses an expected

$$D(P\|Q) + 2\log(1/\epsilon) + O\big(\sqrt{D(P\|Q)} + 1\big)$$

bits of communication such that at the end of the protocol:

• Alice outputs an element $a$ distributed according to $P$.

• Bob outputs an element $b$ such that for each $u \in U$, $\Pr[b = u \mid a = u] \ge 1 - \epsilon$.

Moreover, the message that Bob sends to Alice in any given round consists of a single bit indicating
if the protocol should terminate or if Alice should send the next message.

We point out that in general, the correlated sampling procedure in Lemma 4.1 can take more 
than one round of communication. This is because initially, neither Alice nor Bob knows D{P\\Q) 
and they will need to interactively “discover” it. In our case, we will be using correlated sampling 
in a “distributional setup”. It turns out that this allows us to use a one-way version of correlated 
sampling which is described in Lemma 4.2 below. 

Lemma 4.2. Let $\mu$ be a distribution over $(x,y)$ with marginal $\mu_X$ over $x$, and assume that $\mu$ is
known to both Alice and Bob. Fix $\epsilon > 0$ and let Alice be given $x \sim \mu_X$. There is a one-way
public-coin protocol that uses at most

$$O\big(m \cdot I(X;Y)/\epsilon + \log(1/\epsilon)/\epsilon\big)$$

bits of communication such that with probability at least $1 - \epsilon$ over the public coins of the protocol
and the randomness of $x$, Alice and Bob agree on $m$ samples $y_1, y_2, \dots, y_m$ i.i.d. $\sim \mu_{Y|x}$ at the
end of the protocol.





Proof. When $x$ is Alice’s input, we can consider running the protocol in Lemma 4.1 on the distributions
$P = \prod_{i=1}^{m} \mu_{Y|x}$ and $Q = \prod_{i=1}^{m} \mu_Y$ with error parameter $\epsilon/2$. Let $\Pi$ be the resulting
protocol transcript. The expected communication cost of $\Pi$ is at most

$$\mathbb{E}_{x \sim \mu_X}\big[O(D(P\|Q)) + O(\log(1/\epsilon))\big] = O\big(\mathbb{E}_{x \sim \mu_X}[D(P\|Q)]\big) + O(\log(1/\epsilon)) = O(m \cdot I(X;Y)) + O(\log(1/\epsilon)), \qquad (2)$$

where the last equality follows from the fact that

$$\begin{aligned}
\mathbb{E}_{x \sim \mu_X}[D(P\|Q)] &= \mathbb{E}_{x \sim \mu_X}\, \mathbb{E}_{y_1|x, \dots, y_m|x}\left[\log \frac{\prod_{i=1}^{m} \mu(y_i|x)}{\prod_{i=1}^{m} \mu(y_i)}\right] \\
&= \sum_{i=1}^{m} \mathbb{E}_{x \sim \mu_X}\, \mathbb{E}_{y_i|x}\left[\log \frac{\mu(y_i|x)}{\mu(y_i)}\right] \\
&= \sum_{i=1}^{m} \mathbb{E}_{(x,y) \sim \mu}\left[\log \frac{\mu(y|x)}{\mu(y)}\right] \\
&= m \cdot I(X;Y).
\end{aligned}$$

By Markov’s inequality applied to (2), we get that with probability at least $1 - \epsilon/2$, the length of
the transcript $\Pi$ is at most

$$\ell := O(m \cdot I(X;Y)/\epsilon) + O(\log(1/\epsilon)/\epsilon).$$

Conditioned on the event $E$ that the length of $\Pi$ is at most $\ell$ bits, the total number of bits sent
by Alice to Bob is also at most $\ell$.

Note that Lemma 4.1 guarantees that each message of Bob in $\Pi$ consists of a single bit indicating
if the protocol should terminate or if Alice should send the next message. Hence, Bob’s messages
do not influence the actual bits sent by Alice; they only determine how many bits are sent by her.

In the new one-way protocol $\Pi'$, Alice sends to Bob, in a single shot, the first $\ell$ bits that she
would have sent him in protocol $\Pi$ if he kept refusing to terminate. Upon receiving this message,
Bob completes the simulation of protocol $\Pi$. The error probability of the new protocol $\Pi'$ is the
probability that either Alice did not send enough bits or that the protocol $\Pi$ makes an error, which
by a union bound is at most

$$\Pr[\bar{E}] + \epsilon/2 \le \epsilon/2 + \epsilon/2 = \epsilon,$$

where $\bar{E}$ denotes the complement of event $E$. □
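The identity $\mathbb{E}_{x \sim \mu_X}[D(P\|Q)] = m \cdot I(X;Y)$ used in the proof can be checked numerically on a small example. The following is a minimal sketch, not part of the protocol: the toy joint distribution `mu` and all function names are illustrative assumptions, and logarithms are taken base 2.

```python
import math
from itertools import product

def marginals(mu):
    """Marginals (mu_X, mu_Y) of a joint pmf given as {(x, y): probability}."""
    mu_x, mu_y = {}, {}
    for (x, y), pr in mu.items():
        mu_x[x] = mu_x.get(x, 0.0) + pr
        mu_y[y] = mu_y.get(y, 0.0) + pr
    return mu_x, mu_y

def kl(P, Q):
    """D(P||Q) in bits; assumes Q(u) > 0 wherever P(u) > 0."""
    return sum(pr * math.log2(pr / Q[u]) for u, pr in P.items() if pr > 0)

def mutual_information(mu):
    """I(X;Y) = D(mu || mu_X x mu_Y)."""
    mu_x, mu_y = marginals(mu)
    return kl(mu, {(x, y): mu_x[x] * mu_y[y] for x in mu_x for y in mu_y})

def expected_divergence(mu, m):
    """E_{x ~ mu_X}[D(P||Q)] for P = (mu_{Y|x})^m and Q = (mu_Y)^m, by enumeration."""
    mu_x, mu_y = marginals(mu)
    ys = list(mu_y)
    total = 0.0
    for x, px in mu_x.items():
        cond = {y: mu.get((x, y), 0.0) / px for y in ys}
        P = {t: math.prod(cond[y] for y in t) for t in product(ys, repeat=m)}
        Q = {t: math.prod(mu_y[y] for y in t) for t in product(ys, repeat=m)}
        total += px * kl(P, Q)
    return total

# Toy joint distribution (an assumption): y is a 0.1-noisy copy of a uniform bit x.
mu = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
info = mutual_information(mu)
for m in (1, 2, 3):
    print(m, expected_divergence(mu, m), m * info)
```

For this toy distribution the two printed columns should agree up to floating-point rounding, which is the additivity of KL divergence over product distributions that the proof relies on.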


4.3 Proof of Theorem 1.1 

Recall that in the contextual setting, Alice’s input is $(f, x)$ and Bob’s input is $(g, y)$, where $(f, g) \in \mathrm{ow}\mathcal{F}^{\mu}_{k,\epsilon,\delta}$
and $(x, y) \sim \mu$. Let $\Pi$ be the one-way protocol for $g$ in the standard setting that shows that
$\mathrm{owCC}^{\mu}_{\epsilon}(g) \le k$. Note that $\Pi$ can be described by an integer $L \le 2^k$ and functions $\pi: \mathcal{X} \to [L]$ and
$\{B_i: \mathcal{Y} \to \{0,1\}\}_{i \in [L]}$ such that Alice’s message on input $x$ is $\pi(x)$, and Bob’s output on message
$i$ from Alice and on input $y$ is $B_i(y)$. We use this notation below. We also set the parameter
$m = \Theta((k + \log(1/\theta))/\theta^2)$, which is chosen such that $2^k \cdot \exp(-\theta^2 m/75) \le 2\theta/5$.

The protocol. Algorithm 1 describes the protocol we employ in the contextual setting. Roughly
speaking, the protocol works as follows. First, Alice and Bob run the one-way correlated sampling
procedure given by Lemma 4.2 in order to sample $y_1, y_2, \dots, y_m$ i.i.d. $\sim \mu_{Y|x}$. Then, Alice sends
the sequence $(f_x(y_1), \dots, f_x(y_m))$ to Bob. Bob enumerates over $i \in [L]$ and counts the fraction
of $z \in \{y_1, \dots, y_m\}$ for which $B_i(z) \ne f_x(z)$. For the index $i$ which minimizes this fraction, Bob
outputs $B_i(y)$ and halts.

Algorithm 1: The protocol that handles contextual uncertainty
The setting: Let $\mu$ be a probability distribution over a message space $\mathcal{X} \times \mathcal{Y}$. Alice and
Bob are given functions $f$ and $g$, and inputs $x$ and $y$, respectively, where $(f, g) \in \mathrm{ow}\mathcal{F}^{\mu}_{k,\epsilon,\delta}$
and $(x, y) \sim \mu$.

The protocol:

1. Alice and Bob run one-way correlated sampling with error parameter set to $(\theta/10)^2$ in order
to sample $m$ values $Z = \{y_1, y_2, \dots, y_m\} \subseteq \mathcal{Y}$, each sampled independently according to $\mu_{Y|x}$.

2. Alice sends $\{f_x(y_i)\}_{i \in [m]}$ to Bob.

3. For every $i \in [L]$, Bob computes $\mathrm{err}_i = \frac{1}{m} \sum_{j=1}^{m} \mathbb{1}(B_i(y_j) \ne f_x(y_j))$. Let
$i_{\min} = \arg\min_{i \in [L]} \{\mathrm{err}_i\}$. Bob outputs $B_{i_{\min}}(y)$ and halts.
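To make Steps 2 and 3 concrete, here is a minimal simulation sketch of the protocol in the product-distribution case, where the shared samples can be drawn from $\mu_Y$ with public randomness and no correlated sampling is needed. Everything in the toy instance (the sets `X` and `Y`, the functions `B`, `pi`, `g`, `f`, and the parameter values) is an illustrative assumption rather than something taken from the paper.

```python
import random

def run_protocol(f, B, x, y, samples):
    """Bob's output in one run of the product-case protocol.

    f       -- Alice's function f(x, y) -> 0/1
    B       -- Bob's output functions B_1..B_L from the protocol for g
    samples -- shared y_1..y_m drawn from mu_Y with public randomness
    """
    message = [f(x, yj) for yj in samples]            # Alice's m bits
    errs = [sum(Bi(yj) != bit for yj, bit in zip(samples, message)) / len(samples)
            for Bi in B]                              # err_i for each i in [L]
    i_min = min(range(len(B)), key=errs.__getitem__)  # argmin_i err_i
    return B[i_min](y)

# Toy instance (all choices below are illustrative assumptions).
X, Y, L, m = range(16), range(64), 4, 40
B = [lambda y, i=i: (y >> i) & 1 for i in range(L)]   # B_i(y) = i-th bit of y
pi = lambda x: x % L                                  # Alice's message in the protocol for g
g = lambda x, y: B[pi(x)](y)                          # g has a zero-error protocol with L messages
f = lambda x, y: g(x, y) ^ ((7 * x + y) % 20 == 0)    # f differs from g on roughly 5% of inputs

rng = random.Random(0)
trials, errors = 2000, 0
for _ in range(trials):
    x, y = rng.choice(X), rng.choice(Y)               # uniform product distribution
    samples = [rng.choice(Y) for _ in range(m)]
    errors += run_protocol(f, B, x, y, samples) != g(x, y)
error_rate = errors / trials
print(error_rate)
```

In this instance the functions $B_i$ are pairwise far apart, so Bob almost always recovers the index $\pi(x)$ from Alice's $m$ bits even though she evaluates $f$ rather than $g$.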


Analysis. Observe that by Lemma 4.2, the correlated sampling procedure requires $O(m \cdot I(X;Y)/\theta^2 + \log(1/\theta)/\theta^2)$
bits of communication. Thus, the total communication of our protocol is at most

$$O\big(m \cdot I(X;Y)/\theta^2 + \log(1/\theta)/\theta^2\big) + m \le \frac{c\,(k + \log(1/\theta))}{\theta^2} \cdot \left(1 + \frac{I(X;Y)}{\theta^2}\right) \text{ bits}$$

for some absolute constant $c$, as promised. The next lemma establishes the correctness of the
protocol.

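Lemma 4.3. $\Pr_{\Pi,(x,y) \sim \mu}[B_{i_{\min}}(y) \ne g(x,y)] \le \epsilon + 2\delta + \theta$.

Proof. We start with some notation. For $x \in \mathcal{X}$, let $\delta_x = \delta_{\mu_{Y|x}}(f_x, g_x)$ and let
$\epsilon_x = \delta_{\mu_{Y|x}}(g_x, B_{\pi(x)})$.
Note that by definition, $\delta = \mathbb{E}_{x \sim \mu_X}[\delta_x]$ and $\epsilon = \mathbb{E}_{x \sim \mu_X}[\epsilon_x]$. For $i \in [L]$, let $\gamma_{i,x} = \delta_{\mu_{Y|x}}(f_x, B_i)$.

Note that by the triangle inequality,

$$\gamma_{\pi(x),x} \le \delta_x + \epsilon_x. \qquad (3)$$

In what follows, we will analyze the probability that $B_{i_{\min}}(y) \ne g(x,y)$ by analyzing the estimate
$\mathrm{err}_i$ and the index $i_{\min}$ computed in the above protocol. Note that $\mathrm{err}_i = \mathrm{err}_i(x)$ computed above
attempts to estimate $\gamma_{i,x}$, and that both $\mathrm{err}_i$ and $i_{\min}$ are functions of $x$.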

Note that Lemma 4.2 guarantees that correlated sampling succeeds with probability at least
$1 - \theta^2/100$. Henceforth, we condition on the event that correlated sampling succeeds (we will
account for the event where this does not happen at the end). By the Chernoff bound, we have for
every $x$ and $i \in [L]$

$$\Pr\left[\,|\gamma_{i,x} - \mathrm{err}_i| > \frac{\theta}{5}\,\right] \le \exp\left(-\frac{\theta^2 \cdot m}{75}\right).$$

By a union bound, we have for every $x \in \mathcal{X}$,

$$\Pr_{y_1, \dots, y_m \sim \mu_{Y|x}}\left[\,\exists i \in [L] \text{ s.t. } |\gamma_{i,x} - \mathrm{err}_i| > \frac{\theta}{5}\,\right] \le L \cdot \exp\left(-\frac{\theta^2 \cdot m}{75}\right) \le \frac{2\theta}{5},$$

where the last inequality follows from our choice of $m = \Theta((k + \log(1/\theta))/\theta^2)$.

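Now assume that for all $i \in [L]$, we have that $|\gamma_{i,x} - \mathrm{err}_i| \le \theta/5$, which we refer to below as the
“Good Event”. Then, for $i_{\min}$, we have

$$\begin{aligned}
\gamma_{i_{\min},x} &\le \mathrm{err}_{i_{\min}} + \theta/5 && \text{(since we assumed the Good Event)} \\
&\le \mathrm{err}_{\pi(x)} + \theta/5 && \text{(by definition of } i_{\min}\text{)} \\
&\le \gamma_{\pi(x),x} + 2\theta/5 && \text{(since we assumed the Good Event)} \\
&\le \delta_x + \epsilon_x + 2\theta/5 && \text{(by Equation (3))}
\end{aligned}$$

Let $W \subseteq \{0,1\}^n$ be the set of all $x$ for which correlated sampling succeeds with probability at least
$1 - \theta/10$ (over the internal randomness of the protocol). By Lemma 4.2 and an averaging argument,
$\Pr_{x \sim \mu_X}[x \in W] \ge 1 - \theta/10$. Thus,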


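$$\begin{aligned}
\Pr_{\Pi,(x,y) \sim \mu}[B_{i_{\min}}(y) \ne f(x,y)] &\le \mathbb{E}_{x \sim \mu_X \mid x \in W}\Big[\Pr_{\Pi,\, y \sim \mu_{Y|x}}[B_{i_{\min}}(y) \ne f(x,y)]\Big] + \theta/10 \\
&\le \mathbb{E}_{x \sim \mu_X \mid x \in W}\Big[\Pr_{y_1, \dots, y_m, y \sim \mu_{Y|x}}[B_{i_{\min}}(y) \ne f(x,y)]\Big] + \theta/5 \\
&= \mathbb{E}_{x \sim \mu_X \mid x \in W}[\gamma_{i_{\min},x}] + \theta/5 \\
&\le \mathbb{E}_{x \sim \mu_X \mid x \in W}[\delta_x + \epsilon_x] + 3\theta/5 \\
&\le \mathbb{E}_{x \sim \mu_X}[\delta_x + \epsilon_x] + \theta \\
&= \delta + \epsilon + \theta,
\end{aligned}$$

where the third inequality follows from the fact that the Good Event occurs with probability at
least $1 - 2\theta/5$, and from the corresponding upper bound on $\gamma_{i_{\min},x}$. The other inequalities above
follow from the definition of the set $W$ and the fact that $\Pr_{x \sim \mu_X}[x \notin W] \le \theta/10$. Finally, since
$\delta_\mu(f,g) \le \delta$, we have that Bob’s output does not equal $g(x,y)$ (which is the desired output) with
probability at most $\epsilon + 2\delta + \theta$. □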


5 Lower Bound for Non-Product Distributions 

In this section, we prove Theorem 1.3. We start by defining the class of function pairs and
distributions that will be used. Consider the parity functions on subsets of bits of the string
$x \oplus y \in \{0,1\}^n$. Specifically, for every $S \subseteq [n]$, let $f_S: \{0,1\}^n \times \{0,1\}^n \to \{0,1\}$ be defined as
$f_S(x,y) = \bigoplus_{i \in S}(x_i \oplus y_i)$. Let $q = q(n) > 0$ and define

$$\mathcal{F}_q \triangleq \{(f_S, f_T) : |S \triangle T| \le q \cdot n\}. \qquad (4)$$

Next, we define a probability distribution $\mu_p$ on $\{0,1\}^n \times \{0,1\}^n$ where $p = p(n)$. We do so by
giving a procedure to sample according to $\mu_p$. To sample a pair $(x, y) \sim \mu_p$, we draw $x \in_R \{0,1\}^n$
and let $y$ be a $p$-noisy copy of $x$, i.e., $y \sim N_p(x)$. Here, $N_p(x)$ is the distribution on $\{0,1\}^n$ that
outputs $y \in \{0,1\}^n$ such that, independently, for each $i \in [n]$, $y_i = 1 - x_i$ with probability $p$, and
$y_i = x_i$ with probability $1 - p$. In other words, $\mu_p(x,y) = 2^{-n} \cdot p^{|x \oplus y|} \cdot (1 - p)^{n - |x \oplus y|}$ for every
$(x,y) \in \{0,1\}^n \times \{0,1\}^n$. Here, $|z|$ denotes the Hamming weight of $z$, for any $z \in \{0,1\}^n$.
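The sampling procedure for $\mu_p$ and its closed-form probability can be sketched as follows. The function names and the representation of strings as tuples of bits are assumptions made for illustration only.

```python
import random
from itertools import product

def mu_p(x, y, p):
    """Probability of the pair (x, y) under mu_p; x and y are equal-length bit tuples."""
    n = len(x)
    d = sum(xi != yi for xi, yi in zip(x, y))   # Hamming weight |x XOR y|
    return 2.0 ** (-n) * p ** d * (1 - p) ** (n - d)

def sample_mu_p(n, p, rng):
    """Draw x uniformly from {0,1}^n and let y be a p-noisy copy of x."""
    x = tuple(rng.randrange(2) for _ in range(n))
    y = tuple(xi ^ (rng.random() < p) for xi in x)   # flip each bit independently w.p. p
    return x, y

# Sanity check: the closed form sums to 1 over all pairs (here n = 3, p = 0.2).
cube = list(product((0, 1), repeat=3))
total = sum(mu_p(x, y, 0.2) for x in cube for y in cube)
print(total)
```

Summing the closed form over all pairs is a quick way to confirm that the exponents $|x \oplus y|$ and $n - |x \oplus y|$ are paired with $p$ and $1 - p$ in the right order.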

We will prove Lemmas 5.1 and 5.2 below about the function class $\mathcal{F}_q$ and the distribution $\mu_p$.

Lemma 5.1. For any $p = p(n)$ and $q = q(n)$, it holds that $\mathcal{F}_q \subseteq \mathrm{ow}\mathcal{F}^{\mu_p}_{1,0,pqn}$.

In words, Lemma 5.1 says that any pair of functions in $\mathcal{F}_q$ are $(pqn)$-close in $\delta_{\mu_p}$-distance, and
any function in $\mathcal{F}_q$ has a one-way zero-error protocol with a single bit of communication. Lemma 5.2
lower bounds the contextually uncertain communication complexity of $\mathcal{F}_q$ under distribution $\mu_p$.

Lemma 5.2. For any p = p{n), q = q{n) and e > 0, it holds that: 

ecu {Fq) > 7 • min{p • n, {q/2) ■ n} - log(l/e) p, 

2 

where p = and p > 0 is an absolute constant. 

Note that applying Lemmas 5.1 and 5.2 with $\mathcal{F} = \mathcal{F}_q$, $\mu = \mu_p$ and $p = q = \sqrt{\delta/n}$ (where $\delta > 0$
is any constant) implies Theorem 1.3.

In Subsection 5.1 below, we prove Lemma 5.1 which follows from two simple propositions. The
main part of the rest of this section is dedicated to the proof of Lemma 5.2. The idea behind the
proof of Lemma 5.2 is to reduce the problem of computing $\mathcal{F}_q$ under $\mu_p$ with contextual uncertainty,
into the problem of computing a related function in the standard distributional communication
complexity model (i.e., without uncertainty) under a related distribution. We then use the discrepancy
method to prove a lower bound on the communication complexity of the new problem. This task
itself reduces to upper bounding the spectral norm of a certain communication matrix. The choice
of our underlying distribution then implies a tensor structure for this matrix, which reduces the
spectral norm computation to bounding the largest singular value of an explicit family of $4 \times 4$
matrices.

We point out that our lower bound in Lemma 5.2 is essentially tight up to a logarithmic factor.
Namely, one can show using a simple one-way hashing protocol that for any constant $\epsilon > 0$,
$\mathrm{owCCU}^{\mu_p}_{\epsilon}(\mathcal{F}_q) \le O(r \log r)$ with $r = \min\{2p \cdot n, q \cdot n\}$.

5.1 Proof of Lemma 5.1 

Lemma 5.1 follows from Propositions 5.3 and 5.4 below. We first show that every pair of functions
in $\mathcal{F}_q$ are close under the distribution $\mu_p$.

Proposition 5.3. For every $(f,g) \in \mathcal{F}_q$, it holds that $\delta_{\mu_p}(f,g) \le pqn$.

Proof. Any pair of functions $(f, g) \in \mathcal{F}_q$ is of the form $f = f_S$ and $g = f_T$ with $|S \triangle T| \le q \cdot n$. Hence,

$$\Pr_{(x,y) \sim \mu_p}[f(x,y) \ne g(x,y)] = \Pr_{(x,y) \sim \mu_p}[f_{S \triangle T}(x \oplus y) = 1] \le 1 - (1 - p)^{|S \triangle T|} \le 1 - (1 - p)^{qn} \le pqn.$$

□
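The chain of inequalities in the proof of Proposition 5.3 can be checked numerically. The sketch below is only an illustration: `d` stands for $|S \triangle T|$, each of the `d` relevant bits of $x \oplus y$ is treated as an independent Bernoulli($p$) bit (as under $\mu_p$), and the helper names are hypothetical.

```python
from itertools import product

def parity_one_exact(p, d):
    """Pr[XOR of d independent Bernoulli(p) bits = 1], by enumeration."""
    total = 0.0
    for bits in product((0, 1), repeat=d):
        w = sum(bits)
        if w % 2 == 1:
            total += p ** w * (1 - p) ** (d - w)
    return total

def chain_holds(p, d):
    """Check Pr[parity = 1] <= 1 - (1-p)^d <= p*d, up to rounding slack."""
    exact = parity_one_exact(p, d)
    at_least_one_flip = 1 - (1 - p) ** d
    return exact <= at_least_one_flip + 1e-12 and at_least_one_flip <= p * d + 1e-12

print(all(chain_holds(p, d) for p in (0.01, 0.1, 0.3, 0.5) for d in range(11)))
```

The first inequality holds because an odd number of flips requires at least one flip; the second is the union bound over the `d` coordinates.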

Next, we show that there is a simple one-way communication protocol that allows Alice and
Bob to compute $f_S$ (for any $S \subseteq [n]$) with just a single bit of communication.

Proposition 5.4. $\mathrm{owCC}(f_S) = 1$.

Proof. Recall that $f_S(x,y) = \bigoplus_{i \in S}(x_i \oplus y_i)$. We write this as $f_S(x,y) = (\bigoplus_{i \in S} x_i) \oplus (\bigoplus_{i \in S} y_i)$.
This leads to the simple one-way protocol where Alice computes $b = \bigoplus_{i \in S} x_i$ and sends the single
bit result of the computation to Bob. Bob can now compute $b \oplus (\bigoplus_{i \in S} y_i) = f_S(x,y)$ to obtain
the value of $f_S$ (with zero error). □
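The one-bit protocol from the proof of Proposition 5.4 is simple enough to verify exhaustively for small $n$. The following sketch assumes the standard setting in which both players know $S$; the function names are illustrative.

```python
from itertools import combinations, product

def f_S(S, x, y):
    """Parity of (x_i XOR y_i) over i in S."""
    out = 0
    for i in S:
        out ^= x[i] ^ y[i]
    return out

def alice_message(S, x):
    """Alice's single bit b: the parity of x restricted to S."""
    b = 0
    for i in S:
        b ^= x[i]
    return b

def bob_output(S, b, y):
    """Bob XORs Alice's bit with the parity of y restricted to S."""
    out = b
    for i in S:
        out ^= y[i]
    return out

def protocol_is_correct(n):
    """Exhaustively check the protocol for every S, x and y on n bits."""
    subsets = [S for r in range(n + 1) for S in combinations(range(n), r)]
    inputs = list(product((0, 1), repeat=n))
    return all(bob_output(S, alice_message(S, x), y) == f_S(S, x, y)
               for S in subsets for x in inputs for y in inputs)

print(protocol_is_correct(4))
```

The check passes because XOR is associative and commutative, so the parity over $S$ of $x_i \oplus y_i$ splits into Alice's part and Bob's part.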

5.2 Proof of Lemma 5.2 

In order to lower bound $\mathrm{CCU}^{\mu_p}_{\epsilon}(\mathcal{F}_q)$, we define a communication problem in the standard
distributional complexity setting that can be reduced to the problem of computing $\mathcal{F}_q$ under contextual
uncertainty. The lower bound in Lemma 5.2 is then obtained by proving a lower bound on the
communication complexity of the new problem which is defined as follows:

• Inputs: Alice’s input is a pair $(S,x)$ where $S \subseteq [n]$ and $x \in \{0,1\}^n$. Bob’s input is a pair
$(T, y)$ such that $T \subseteq [n]$ and $y \in \{0,1\}^n$.

• Distribution: Let $\mathcal{D}_q$ be a distribution on pairs of Boolean functions $(f, g)$ on $\{0,1\}^n \times \{0,1\}^n$
defined by the following sampling procedure. To sample $(f,g) \sim \mathcal{D}_q$, we pick a set $S \subseteq [n]$
uniformly at random and set $f = f_S$. We then pick $T$ to be a $(q/2)$-noisy copy of $S$ and set
$g = f_T$. The distribution on the inputs of Alice and Bob is then described by $\nu_{p,q} = \mathcal{D}_q \times \mu_p$:
we sample $(x,y) \sim \mu_p$ and sample $(f,g) \sim \mathcal{D}_q$.

• Function: The goal is to compute the function $F$ given by $F((S,x), (T,y)) = f_T(x \oplus y)$.

Proposition 5.5 below, which follows from a simple Chernoff bound, shows that a protocol
computing $\mathcal{F}_q$ under $\mu_p$ can also be used to compute the function $F$ in the standard distributional
model with $((S,x), (T,y)) \sim \nu_{p,q}$, and with the same amount of communication.

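Proposition 5.5. For every $\epsilon > 0$, it holds that $\mathrm{CCU}^{\mu_p}_{\epsilon}(\mathcal{F}_q) \ge \mathrm{CC}^{\nu_{p,q}}_{\epsilon'}(F)$ with $\epsilon' = \epsilon + 2^{-\Theta(q \cdot n)}$.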

In the rest of this section, we will prove the following lower bound on $\mathrm{CC}^{\nu_{p,q}}_{1/2 - \epsilon}(F)$, which along
with Proposition 5.5, implies Lemma 5.2:

Lemma 5.6. For every $\epsilon > 0$, it holds that

$$\mathrm{CC}^{\nu_{p,q}}_{1/2 - \epsilon}(F) \ge \gamma \cdot \min\{p \cdot n, (q/2) \cdot n\} - \log(1/\epsilon),$$

where $\gamma > 0$ is an absolute constant.

To prove Lemma 5.6, without loss of generality, we will set $q = 2p$ and prove a lower bound of
$\gamma \cdot p \cdot n$ on the communication complexity. (*) So henceforth, we denote $\nu_p = \nu_{p,2p}$. The proof will use
the discrepancy bound, which is a well-known method for proving lower bounds on distributional
communication complexity in the standard model.

(*) We can do so because $\mathrm{CC}^{\nu_{p,q}}_{\epsilon}(F) \ge \mathrm{CC}^{\nu_{r,2r}}_{\epsilon}(F)$ with $r = \min(p, q/2)$, which follows from the fact that Alice can
always use her private randomness to reduce the correlation between either $(x,y)$ or $(S,T)$.
Definition 5.7 (Discrepancy; [KN97]). Let $F$ and $\nu_p$ be as above and let $R$ be any rectangle (i.e.,
any set of the form $R = C \times D$ where $C, D \subseteq \{0,1\}^{2n}$). Denote

$$\mathrm{Disc}_{\nu_p}(R, F) \triangleq \Big|\, \Pr_{\nu_p}[F((S,x),(T,y)) = 0,\ ((S,x),(T,y)) \in R] - \Pr_{\nu_p}[F((S,x),(T,y)) = 1,\ ((S,x),(T,y)) \in R] \,\Big|.$$

The discrepancy of $F$ according to $\nu_p$ is $\mathrm{Disc}_{\nu_p}(F) = \max_R \mathrm{Disc}_{\nu_p}(R, F)$, where the maximum is
over all rectangles $R$.


The next known proposition relates distributional communication complexity to discrepancy.

Proposition 5.8 ([KN97]). For any $\epsilon > 0$, it holds that $\mathrm{CC}^{\nu_p}_{1/2 - \epsilon}(F) \ge \log(2\epsilon/\mathrm{Disc}_{\nu_p}(F))$.

We will prove the following lemma.

Lemma 5.9. $\mathrm{Disc}_{\nu_p}(F) \le 2^{-\gamma \cdot p \cdot n}$ for some absolute constant $\gamma > 0$.


Note that Lemma 5.9 and Proposition 5.8 put together immediately imply Lemma 5.6. The
proof of Lemma 5.9 uses some standard facts about the spectral properties of matrices and their
tensor powers that we next recall. Let $A \in \mathbb{R}^{d \times d}$ be a real square matrix. Then, $v \in \mathbb{R}^d$ is said to
be an eigenvector of $A$ with eigenvalue $\lambda \in \mathbb{R}$ if $Av = \lambda v$. If $A$ is furthermore (symmetric) positive
semi-definite, then all its eigenvalues are real and non-negative. We can now define the spectral
norm of a (not necessarily symmetric) matrix.

Definition 5.10. The spectral norm of a matrix $A \in \mathbb{R}^{d \times d}$ is given by $\|A\| = \sqrt{\lambda_{\max}(A^\top A)}$, where
$\lambda_{\max}(A^\top A)$ is the largest eigenvalue of $A^\top A$.

Also, recall that given a matrix $A \in \mathbb{R}^{d \times d}$ and a positive integer $t$, the tensor power matrix
$A^{\otimes t} \in \mathbb{R}^{d^t \times d^t}$ is defined by $(A^{\otimes t})_{(i_1, \dots, i_t),(j_1, \dots, j_t)} = \prod_{\ell=1}^{t} A_{i_\ell, j_\ell}$ for every
$(i_1, \dots, i_t), (j_1, \dots, j_t) \in \{1, \dots, d\}^t$. We will use
the following standard fact which in particular says that the spectral norm is multiplicative with
respect to tensoring.

Fact 5.11. For any matrix $A \in \mathbb{R}^{d \times d}$, vector $v \in \mathbb{R}^d$, scalar $c \in \mathbb{R}$ and positive integer $t$, we have

1. $\|cA\| = |c| \cdot \|A\|$.

2. $\|A^{\otimes t}\| = \|A\|^t$.

3. $\|Av\|_2 \le \|A\| \cdot \|v\|_2$, where for any vector $w \in \mathbb{R}^d$, $\|w\|_2$ denotes the Euclidean norm of $w$,
i.e., $\|w\|_2 = \sqrt{\sum_{i=1}^{d} w_i^2}$.


The next lemma upper bounds the spectral norm of an explicit family of $4 \times 4$ matrices that
will be used in the proof of Lemma 5.9. Looking ahead, it is crucial for our purposes that the
coefficient of $a$ in the right-hand side of Equation (5) is a constant strictly smaller than 2.

Lemma 5.12. Let $a \in (0,1)$ be a real number and

$$N = N(a) = \begin{pmatrix} 1 & a & a & -a^2 \\ a & 1 & -a^2 & a \\ a & a^2 & 1 & -a \\ a^2 & a & -a & 1 \end{pmatrix}.$$

Then,

$$\|N\| \le 1 + \sqrt{2} \cdot a + a^2 + \frac{a^4}{2} + \frac{a^5}{\sqrt{2}}. \qquad (5)$$

Proof. One can verify that

$$N^\top N = \begin{pmatrix}
(a^2+1)^2 & 2a(a^2+1) & 2a(1-a^2) & 0 \\
2a(a^2+1) & (a^2+1)^2 & 0 & 2a(1-a^2) \\
2a(1-a^2) & 0 & (a^2+1)^2 & -2a(a^2+1) \\
0 & 2a(1-a^2) & -2a(a^2+1) & (a^2+1)^2
\end{pmatrix}.$$

Assuming that $a \in (0,1)$, one can also verify that $N^\top N$ has as eigenvectors

$$v_1 = \begin{pmatrix} \frac{\sqrt{2(a^4+1)}}{1-a^2} \\ \frac{a^2+1}{1-a^2} \\ 1 \\ 0 \end{pmatrix}, \quad
v_2 = \begin{pmatrix} \frac{a^2+1}{1-a^2} \\ \frac{\sqrt{2(a^4+1)}}{1-a^2} \\ 0 \\ 1 \end{pmatrix}
\quad \text{with eigenvalue } \lambda_1(a) := 2a^2 + a^4 + 2a\sqrt{2(a^4+1)} + 1,$$

and

$$v_3 = \begin{pmatrix} \frac{\sqrt{2(a^4+1)}}{a^2-1} \\ \frac{a^2+1}{1-a^2} \\ 1 \\ 0 \end{pmatrix}, \quad
v_4 = \begin{pmatrix} \frac{a^2+1}{1-a^2} \\ \frac{\sqrt{2(a^4+1)}}{a^2-1} \\ 0 \\ 1 \end{pmatrix}
\quad \text{with eigenvalue } \lambda_2(a) := 2a^2 + a^4 - 2a\sqrt{2(a^4+1)} + 1.$$

Note that for any value of $a \in (0,1)$, the vectors $v_1$, $v_2$, $v_3$ and $v_4$ are linearly independent and
each of the eigenvalues $\lambda_1(a)$ and $\lambda_2(a)$ has multiplicity 2. Moreover, we have that $\lambda_1(a) > \lambda_2(a)$.
Hence,

$$\|N\| = \sqrt{\lambda_1(a)} = \sqrt{2a^2 + a^4 + 2a\sqrt{2(a^4+1)} + 1}.$$

Applying twice the fact that $\sqrt{1 + x} \le 1 + x/2$ for any $x \ge -1$, we get that

$$\begin{aligned}
\|N\| &= \sqrt{1 + 2a^2 + a^4 + 2a\sqrt{2}\sqrt{1 + a^4}} \\
&\le 1 + a^2 + \frac{a^4}{2} + a\sqrt{2}\sqrt{1 + a^4} \\
&\le 1 + a^2 + \frac{a^4}{2} + a\sqrt{2}\left(1 + \frac{a^4}{2}\right) \\
&= 1 + a\sqrt{2} + a^2 + \frac{a^4}{2} + \frac{a^5}{\sqrt{2}}.
\end{aligned}$$

□
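The claims in the proof of Lemma 5.12 lend themselves to a numerical sanity check. The sketch below is an illustration under stated assumptions: the matrix is built from the entry formula $N_{(S_i,x_i),(T_i,y_i)} = (-1)^{T_i (x_i \oplus y_i)} a^{|S_i \oplus T_i| + |x_i \oplus y_i|}$ with rows and columns ordered 00, 01, 10, 11 (this ordering is an assumption), and the function names are hypothetical.

```python
import math

def N_matrix(a):
    """4x4 matrix with rows (S_i, x_i) and columns (T_i, y_i), ordered 00, 01, 10, 11."""
    idx = [(0, 0), (0, 1), (1, 0), (1, 1)]
    return [[(-1) ** (T * (x ^ y)) * a ** ((S ^ T) + (x ^ y)) for (T, y) in idx]
            for (S, x) in idx]

def gram(N):
    """The Gram matrix N^T N."""
    return [[sum(N[k][i] * N[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def check(a):
    """Return (eigenvector residual, spectral norm sqrt(lambda_1), bound from (5))."""
    G = gram(N_matrix(a))
    s = math.sqrt(2 * (a ** 4 + 1))
    al, be = a * a + 1, 1 - a * a
    lam1 = 2 * a * a + a ** 4 + 2 * a * s + 1
    lam2 = 2 * a * a + a ** 4 - 2 * a * s + 1
    pairs = [([s / be, al / be, 1, 0], lam1), ([al / be, s / be, 0, 1], lam1),
             ([-s / be, al / be, 1, 0], lam2), ([al / be, -s / be, 0, 1], lam2)]
    # Largest entry of (N^T N) v - lambda * v over the four claimed eigenpairs.
    residual = max(abs(sum(G[i][j] * v[j] for j in range(4)) - lam * v[i])
                   for v, lam in pairs for i in range(4))
    bound = 1 + math.sqrt(2) * a + a * a + a ** 4 / 2 + a ** 5 / math.sqrt(2)
    return residual, math.sqrt(lam1), bound

for a in (0.05, 0.3, 0.7, 0.95):
    print(a, check(a))
```

A residual near zero supports the claimed eigenpairs, and comparing the last two returned values checks inequality (5) at the chosen values of $a$.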


We are now ready to prove Lemma 5.9. 

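Proof of Lemma 5.9: Fix any rectangle $R = C \times D$ where $C, D \subseteq \{0,1\}^{2n}$. We wish to show
that $\mathrm{Disc}_{\nu_p}(R, F) \le 2^{-\gamma \cdot p \cdot n}$. First, note that $\mathrm{Disc}_{\nu_p}(R, F) = |1_C^\top M 1_D|$, where $1_C$ and $1_D$ are the
0/1 indicator vectors of $C$ and $D$ (respectively) and $M$ is the $2^{2n} \times 2^{2n}$ real matrix defined by (*)

$$M_{((S,x),(T,y))} = \nu_p((S,x),(T,y)) \cdot (-1)^{F((S,x),(T,y))} = \frac{1}{2^{2n}} (1 - p)^{2n} (-1)^{\langle T, x \oplus y \rangle} \left(\frac{p}{1-p}\right)^{|S \oplus T| + |x \oplus y|}$$

for every $S, x, T, y \in \{0,1\}^n$. Letting $a = p/(1 - p)$, we can write

$$M = \frac{1}{2^{2n}} (1 - p)^{2n} N^{\otimes n},$$

with $N = N(a)$ being the $4 \times 4$ real matrix defined by (**)

$$N_{(S_i,x_i),(T_i,y_i)} = (-1)^{T_i (x_i \oplus y_i)}\, a^{|S_i \oplus T_i| + |x_i \oplus y_i|} \qquad (6)$$

for all $S_i, x_i, T_i, y_i \in \{0,1\}$. Using the third property listed in Fact 5.11, we get

$$\mathrm{Disc}_{\nu_p}(R, F) = |1_C^\top M 1_D| \le \|1_C\|_2 \cdot \|M\| \cdot \|1_D\|_2 \le \sqrt{2^{2n}} \cdot \|M\| \cdot \sqrt{2^{2n}} = 2^{2n} \cdot \|M\|. \qquad (7)$$

We now use the first two properties listed in Fact 5.11 to relate $\|M\|$ to $\|N\|$ as follows:

$$\|M\| = \left\| \frac{1}{2^{2n}} (1 - p)^{2n} N^{\otimes n} \right\| = \frac{1}{2^{2n}} (1 - p)^{2n} \|N\|^n. \qquad (8)$$

Using Equation (6), we can check that

$$N = N(a) = \begin{pmatrix} 1 & a & a & -a^2 \\ a & 1 & -a^2 & a \\ a & a^2 & 1 & -a \\ a^2 & a & -a & 1 \end{pmatrix}.$$

Applying Lemma 5.12 with $a = p/(1 - p)$ and $p$ sufficiently small (e.g., less than 1/10), we get

$$\|N\| \le 1 + \sqrt{2} \cdot \left(\frac{p}{1 - p}\right) + O(p^2). \qquad (9)$$

Combining Equations (7) to (9) above, we conclude that

$$\begin{aligned}
\mathrm{Disc}_{\nu_p}(R, F) &\le (1 - p)^{2n} \cdot \left(1 + \sqrt{2} \cdot \left(\frac{p}{1 - p}\right) + O(p^2)\right)^n \\
&= \left[(1 - p) \cdot \left(1 + p \cdot (\sqrt{2} - 1) + O(p^2)\right)\right]^n \\
&= \left[1 - p \cdot (2 - \sqrt{2}) + O(p^2)\right]^n \\
&\le 2^{-\gamma \cdot p \cdot n}
\end{aligned}$$

for some absolute constant $\gamma > 0$. □

(*) We here use the symbols $S$ and $T$ to denote both subsets of $[n]$ and the corresponding 0/1 indicator vectors.

(**) In Equation (6), $T_i (x_i \oplus y_i)$ denotes the product of the bit $T_i$ and the bit $(x_i \oplus y_i)$. Moreover, since $(S_i \oplus T_i)$
is a single bit, its Hamming weight $|S_i \oplus T_i|$ is the same as its bit-value, and similarly for $(x_i \oplus y_i)$.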
6 Conclusion and Future Directions


In this work, we introduced and studied a simple model illustrating the role of context in
communication and the challenge posed by uncertainty of knowledge of context.

On the technical side, it would be interesting to determine the correct exponent of $I(X;Y)$ in
Theorem 1.1. Theorems 1.1 and 1.3 imply that this exponent is between 1/2 and 1.

It would also be interesting to prove an analogue of Theorem 1.1 for two-way protocols. Our
proof of Theorem 1.1 uses in particular the fact that any low-communication one-way protocol in
the standard distributional communication model should have a certain canonical form: to compute
$g(x,y)$, Alice tries to describe the entire function $g(x, \cdot)$ to Bob, and this does not create a huge
overhead in communication. Coming up with a canonical form of two-way protocols that somehow
changes gradually as we morph from $g$ to $f$ seems to be the essence of the challenge in extending
Theorem 1.1 to the two-way setting.

On the more conceptual side, arguably, the model considered in this work is a fairly realistic one: 
communication has some goals in mind which we model by letting Bob be interested in a specific 
function of the joint information that Alice and Bob possess. Moreover, it is a fairly natural model 
to posit that the two are not in perfect synchronization about the function that Bob is interested 
in, but Alice can estimate the function in some sense. One aspect of our model that can be further 
refined is the specific notion of distance that quantifies the gap between Bob’s function and Alice’s 
estimate. In this work, we chose the Hamming distance which forms a good first starting point. 
We believe that it is interesting to propose and study other models of distance between functions 
that more accurately capture natural forms of uncertainty. 

Finally, we wish to emphasize the mix of adversarial and probabilistic elements in our
uncertainty model: the adversary picks $(f,g)$ whereas the inputs $(x,y)$ are picked from a distribution.
We believe that richer mixtures of adversarial and probabilistic elements could lead to broader
settings of modeling and coping with uncertainty. The probabilistic elements offer efficient
possibilities that are often immediately ruled out by adversarial choices, whereas the adversarial elements
prevent the probabilistic assumptions from being too precise.